stochastic domain
Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation
Phan, Thomy, Belzner, Lenz, Gabor, Thomas, Schmid, Kyrill
Making decisions is a great challenge in distributed autonomous environments due to enormous state spaces and uncertainty. Many online planning algorithms rely on statistical sampling to avoid searching the whole state space, while still being able to make acceptable decisions. However, planning often has to be performed under strict computational constraints making online planning in multi-agent systems highly limited, which could lead to poor system performance, especially in stochastic domains. In this paper, we propose Emergent Value function Approximation for Distributed Environments (EVADE), an approach to integrate global experience into multi-agent online planning in stochastic domains to consider global effects during local planning. For this purpose, a value function is approximated online based on the emergent system behaviour by using methods of reinforcement learning. We empirically evaluated EVADE with two statistical multi-agent online planning algorithms in a highly complex and stochastic smart factory environment, where multiple agents need to process various items at a shared set of machines. Our experiments show that EVADE can effectively improve the performance of multi-agent online planning while offering efficiency w.r.t. the breadth and depth of the planning process.
Using Abstraction for Interpretable Robot Programs in Stochastic Domains
A robot's actions are inherently stochastic, as its sensors are noisy and its actions do not always have the intended effects. For this reason, the agent language Golog has been extended to models with degrees of belief and stochastic actions. While this allows more precise robot models, the resulting programs are much harder to comprehend, because they need to deal with the noise, e.g., by looping until some desired state has been reached with certainty, and because the resulting action traces consist of a large number of actions cluttered with sensor noise. To alleviate these issues, we propose to use abstraction. We define a high-level and nonstochastic model of the robot and then map the high-level model into the lower-level stochastic model. The resulting programs are much easier to understand, often do not require belief operators or loops, and produce much shorter action traces.
Bayesian Policy Search for Stochastic Domains
Tolpin, David, Zhou, Yuan, Yang, Hongseok
AI planning can be cast as inference in probabilistic models, and probabilistic programming was shown to be capable of policy search in partially observable domains. Prior work introduces policy search through Markov chain Monte Carlo in deterministic domains, as well as adapts black-box variational inference to stochastic domains, however not in the strictly Bayesian sense. In this work, we cast policy search in stochastic domains as a Bayesian inference problem and provide a scheme for encoding such problems as nested probabilistic programs. We argue that probabilistic programs for policy search in stochastic domains should involve nested conditioning, and provide an adaption of Lightweight Metropolis-Hastings (LMH) for robust inference in such programs. We apply the proposed scheme to stochastic domains and show that policies of similar quality are learned, despite a simpler and more general inference algorithm. We believe that the proposed variant of LMH is novel and applicable to a wider class of probabilistic programs with nested conditioning.
Learning Relational Dynamics of Stochastic Domains for Planning
Martínez, David (Institut de Robòtica i Informàtica Industrial (CSIC-UPC)) | Alenyà, Guillem (Institut de Robòtica i Informàtica Industrial (CSIC-UPC)) | Torras, Carme (Institut de Robòtica i Informàtica Industrial (CSIC-UPC)) | Ribeiro, Tony (IRCCyN, École Centrale de Nantes) | Inoue, Katsumi (National Institute of Informatics, Japan)
Probabilistic planners are very flexible tools that can provide good solutions for difficult tasks. However, they rely on a model of the domain, which may be costly to either hand code or automatically learn for complex tasks. We propose a new learning approach that (a) requires only a set of state transitions to learn the model; (b) can cope with uncertainty in the effects; (c) uses a relational representation to generalize over different objects; and (d) in addition to action effects, it can also learn exogenous effects that are not related to any action, e.g., moving objects, endogenous growth and natural development. The proposed learning approach combines a multi-valued variant of inductive logic programming for the generation of candidate models, with an optimization method to select the best set of planning operators to model a problem. Finally, experimental validation is provided that shows improvements over previous work.